Abstract

We address the problem of deciding whether a causal or probabilistic query is estimable from data corrupted by missing entries, given a model of the missingness process. We extend the results of Mohan et al. [2013] by presenting more general conditions for recovering probabilistic queries of the form $P(y \mid x)$ and $P(y, x)$ as well as causal queries of the form $P(y \mid do(x))$. We show that causal queries may be recoverable even when the factors in their identifying estimands are not recoverable. Specifically, we derive graphical conditions for recovering causal effects of the form $P(y \mid do(x))$ when $Y$ and its missingness mechanism are not d-separable. Finally, we apply our results to problems of attrition and characterize the recovery of causal effects from data corrupted by attrition.


1 Introduction

All branches of experimental science are plagued by missing data. Improper handling of missing data can bias outcomes and potentially distort the conclusions drawn from a study. Therefore, accurate diagnosis of the causes of missingness is crucial for the success of any research. We employ a formal representation called 'Missingness Graphs' (m-graphs, for short) to explicitly portray the missingness process as well as the dependencies among variables in the available dataset (Mohan et al. [2013]). Apart from determining whether recoverability is feasible, namely, whether there exists any theoretical impediment to estimability of queries of interest, m-graphs can also provide a means for communication and refinement of assumptions about the missingness process. Furthermore, m-graphs permit us to detect violations in modeling assumptions even when the dataset is contaminated with missing entries (Mohan and Pearl [2014]).

In this paper, we extend the results of Mohan et al. [2013] by presenting general conditions under which probabilistic queries such as joint and conditional distributions can be recovered. We show that causal queries of the type $P(y \mid do(x))$ can be recovered even when the associated probabilistic relations such as $P(y, x)$ and $P(y \mid x)$ are not recoverable. In particular, causal effects may be recoverable even when $Y$ is not separable from its missingness mechanism. Finally, we apply our results to recover causal effects when the available dataset is tainted by attrition.

This paper is organized as follows. Section 2 provides an overview of missingness graphs and reviews the notion of recoverability, i.e. obtaining consistent estimates of a query, given a dataset and an m-graph. Section 3 refines the sequential factorization theorem presented in Mohan et al. [2013] and extends its applicability to a wider range of problems in which missingness mechanisms may influence each other. In Section 4, we present general algorithms to recover joint distributions from the class of problems for which the sequential factorization theorem fails. In Section 5, we introduce new graphical criteria that preclude recoverability of joint and conditional distributions. In Section 6, we discuss recoverability of causal queries and show that, unlike probabilistic queries, $P(y \mid do(x))$ may be recovered even when $Y$ and its missingness mechanism ($R_Y$) are not d-separable. In Section 7, we demonstrate how we can apply our results to problems of attrition in which missingness is a severe obstacle to sound inferences. Related work is discussed in Section 8 and conclusions are drawn in Section 9. Proofs of all theoretical results in this paper are provided in the appendix.

Figure 1: Typical m-graph where $V_o = \{S, X\}$, $V_m = \{I, Q\}$, $V^* = \{I^*, Q^*\}$, $R = \{R_I, R_Q\}$ and $U$ is the latent common cause. Members of $V_o$ and $V_m$ are represented by full and hollow circles respectively. The associated missingness process and assumptions are elaborated in appendix 10.1.


2 Missingness Graph and Recoverability


Missingness graphs, as discussed below, were first defined in Mohan et al. [2013], and we adopt the same notation. Let $G(\mathbf{V}, E)$ be the causal DAG where $\mathbf{V} = V \cup U \cup V^* \cup R$. $V$ is the set of observable nodes. Nodes in the graph correspond to variables in the data set. $U$ is the set of unobserved nodes (also called latent variables). $E$ is the set of edges in the DAG. We use bi-directed edges as a shorthand notation to denote the existence of a $U$ variable as a common parent of two variables in $V \cup R$. $V$ is partitioned into $V_o$ and $V_m$ such that $V_o \subseteq V$ is the set of variables that are observed in all records in the population and $V_m \subseteq V$ is the set of variables that are missing in at least one record. Variable $X$ is termed fully observed if $X \in V_o$, partially observed if $X \in V_m$, and substantive if $X \in V_o \cup V_m$. Associated with every partially observed variable $V_i \in V_m$ are two other variables, $R_{V_i}$ and $V_i^*$, where $V_i^*$ is a proxy variable that is actually observed, and $R_{V_i}$ represents the status of the causal mechanism responsible for the missingness of $V_i^*$; formally,


$$v_i^* = f(r_{v_i}, v_i) = \begin{cases} v_i & \text{if } r_{v_i} = 0 \\ m & \text{if } r_{v_i} = 1 \end{cases} \qquad (1)$$


$V^*$ is the set of all proxy variables and $R$ is the set of all causal mechanisms that are responsible for missingness. $R$ variables may not be parents of variables in $V \cup U$. We call this graphical representation a Missingness Graph (or m-graph). An example of an m-graph is given in Figure 1 (a). We use the following shorthand. For any variable $X$, let $X'$ be a shorthand for $X = 0$. For any set $W \subseteq V_m \cup V_o \cup R$, let $W_r$, $W_o$ and $W_m$ be the shorthand for $W \cap R$, $W \cap V_o$ and $W \cap V_m$ respectively. Let $R_W$ be a shorthand for $R_{V_m \cap W}$, i.e. $R_W$ is the set containing the missingness mechanisms of all partially observed variables in $W$. Note that $R_W$ and $W_r$ are not the same. $G_{\underline{X}}$ and $G_{\overline{X}}$ represent the graphs formed by removing from $G$ all edges leaving and entering $X$, respectively.
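A minimal sketch of equation (1), assuming each record is a Python dict and using the string "m" for the missingness symbol (both are illustrative choices, not part of the formalism). `proxy` implements $f$; `observe` is a hypothetical helper that turns one underlying record plus its $R$ values into the row an analyst would actually see. The variable names follow Figure 1; the values are made up.

```python
from typing import Any, Dict

MISSING = "m"  # stands for the symbol m of equation (1); the string is an arbitrary choice


def proxy(r: int, v: Any) -> Any:
    """Equation (1): v* equals v when r = 0 and the missingness symbol m when r = 1."""
    if r not in (0, 1):
        raise ValueError("r must be 0 or 1")
    return v if r == 0 else MISSING


def observe(record: Dict[str, Any], r: Dict[str, int]) -> Dict[str, Any]:
    """Map one underlying record to the observed row (hypothetical helper).

    Variables absent from `r` are fully observed and copied unchanged; each
    partially observed variable V is replaced by its proxy V*, and R_V is
    stored next to it.
    """
    row: Dict[str, Any] = {}
    for name, value in record.items():
        if name in r:
            row[name + "*"] = proxy(r[name], value)
            row["R_" + name] = r[name]
        else:
            row[name] = value
    return row


# Hypothetical record for the wage-gap study: Q is observed, I is missing.
example_row = observe({"S": 1, "X": 5, "Q": 2, "I": 40}, {"Q": 0, "I": 1})
```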

Figure 2: (a) m-graph in which $P(V)$ is recoverable by the sequential factorization; (b) & (c): m-graphs for which no admissible sequence exists.

A manifest distribution $P(V_o, V^*, R)$ is the distribution that governs the available dataset. An underlying distribution $P(V_o, V_m, R)$ is said to be compatible with a given manifest distribution $P(V_o, V^*, R)$ if the latter can be obtained from the former using equation 1. Manifest distribution $P_m$ is compatible with a given underlying distribution $P_u$ if, for every $X \subseteq V_m$ with $Y = V_m \setminus X$, the following equality holds true:

$$P_m(R'_X, R_Y, X^*, Y^*, V_o) = P_u(R'_X, R_Y, X, V_o)$$

where $R'_X$ denotes $R_X = 0$ and $R_Y$ denotes $R_Y = 1$. Refer to Appendix 10.2 for an example.
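When $P_u$ and $P_m$ are finite tables, compatibility can be checked mechanically. The sketch below is our own illustration: it assumes discrete partially observed variables, omits $V_o$ for brevity (fully observed columns would simply be carried through), and uses an ad hoc tuple layout for table keys. Applying equation (1) to every cell of $P_u$ and summing the cells that map to the same proxy values yields all of the equalities above at once; for two variables $Z, W$ these are the four cases listed in appendix 10.2.

```python
from collections import defaultdict
from itertools import product
from typing import Dict, Tuple

M = "m"  # stands for the missingness symbol m of equation (1)


def manifest_from_underlying(pu: Dict[Tuple, float], n_vars: int) -> Dict[Tuple, float]:
    """Push an underlying table P_u(V_m, R) through equation (1).

    Keys of pu are tuples (v_1, ..., v_n, r_1, ..., r_n). Keys of the result are
    (v*_1, ..., v*_n, r_1, ..., r_n); every v_i with r_i = 1 becomes m, so those
    entries accumulate the sum over the unobserved values of V_i.
    """
    pm: Dict[Tuple, float] = defaultdict(float)
    for key, p in pu.items():
        values, rs = key[:n_vars], key[n_vars:]
        proxies = tuple(v if r == 0 else M for v, r in zip(values, rs))
        pm[proxies + rs] += p
    return dict(pm)


def is_compatible(pm: Dict[Tuple, float], pu: Dict[Tuple, float], n_vars: int,
                  tol: float = 1e-12) -> bool:
    """True if pm matches the table implied by pu, i.e. every equality holds."""
    derived = manifest_from_underlying(pu, n_vars)
    keys = set(derived) | set(pm)
    return all(abs(derived.get(k, 0.0) - pm.get(k, 0.0)) <= tol for k in keys)


if __name__ == "__main__":
    # Hypothetical uniform underlying table over (Z, W, R_Z, R_W), all binary.
    demo_pu = {k: 1 / 16 for k in product((0, 1), repeat=4)}
    demo_pm = manifest_from_underlying(demo_pu, 2)
    print(demo_pm[(M, M, 1, 1)])  # Case-4: summed over z and w
```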

2.1 Recoverability

Given a manifest distribution $P(V^*, V_o, R)$ and an m-graph $G$ that depicts the missingness process, query $Q$ is recoverable if we can compute a consistent estimate of $Q$ as if no data were missing. Formally,

Definition 1 (Recoverability (Mohan et al. [2013])). Given an m-graph $G$ and a target relation $Q$ defined on the variables in $V$, $Q$ is said to be recoverable in $G$ if there exists an algorithm that produces a consistent estimate of $Q$ for every dataset $D$ such that $P(D)$ is (1) compatible with $G$ and (2) strictly positive¹ over complete cases, i.e. $P(V_o, V_m, R = 0) > 0$.

For an introduction to the notion of recoverability, see Pearl and Mohan [2013] and Mohan et al. [2013].

3 Recovering Probabilistic Queries by Sequential Factorization

Mohan et al. [2013] (Theorem 4) presented a sufficient condition for recovering probabilistic queries such as joint and conditional distributions by using ordered factorizations. However, the theorem is not applicable to certain classes of problems, such as those in longitudinal studies in which edges exist between $R$ variables. The general ordered factorization defined below broadens the concept of ordered factorization (Mohan et al. [2013]) to include the set of $R$ variables. Subsequently, the modified theorem (stated below as Theorem 1) will permit us to handle cases in which $R$ variables are contained in separating sets that d-separate partially observed variables from their respective missingness mechanisms (example: $X \perp\!\!\!\perp R_X \mid R_Y$ in Figure 2 (a)).

Definition 2 (General Ordered Factorization). Given a graph $G$ and a set $O$ of ordered $V \cup R$ variables $Y_1 < Y_2 < \ldots < Y_k$, a general ordered factorization relative to $G$, denoted by $f(O)$, is a product of conditional probabilities $f(O) = \prod_i P(Y_i \mid X_i)$ where $X_i \subseteq \{Y_{i+1}, \ldots, Y_k\}$ is a minimal set such that $Y_i \perp\!\!\!\perp (\{Y_{i+1}, \ldots, Y_k\} \setminus X_i) \mid X_i$ holds in $G$.

Theorem 1 (Sequential Factorization). A sufficient condition for recoverability of a relation $Q$ defined over substantive variables is that $Q$ be decomposable into a general ordered factorization, or a sum of such factorizations, such that every factor $Q_i = P(Y_i \mid X_i)$ satisfies: (1) $Y_i \perp\!\!\!\perp (R_{Y_i}, R_{X_i}) \mid X_i \setminus \{R_{Y_i}, R_{X_i}\}$, if $Y_i \in V_o \cup V_m$, and (2) $Z \notin X_i$, $X_r \cap R_{X_m} = \emptyset$ and $R_Z \perp\!\!\!\perp R_{X_i} \mid X_i$, if $Y_i = R_Z$ for any $Z \in V_m$.

An ordered factorization that satisfies the condition in Theorem 1 is called an admissible sequence.

The following example illustrates the use of Theorem 1 for recovering the joint distribution. Additionally, it sheds light on the need for the notion of minimality in Definition 2.

¹ An extension to datasets that are not strictly positive over complete cases is sometimes feasible (Mohan et al. [2013]).
Example 1. We are interested in recovering $P(X, Y, Z)$ given the m-graph in Figure 2 (a). We discern from the graph that Definition 2 is satisfied because: (1) $P(Y \mid X, Z, R_Y) = P(Y \mid X, Z)$ and $\{X, Z\}$ is a minimal set such that $Y \perp\!\!\!\perp (\{X, Z, R_Y\} \setminus \{X, Z\}) \mid X, Z$, (2) $P(X \mid R_Y, Z) = P(X \mid R_Y)$ and $R_Y$ is the minimal set such that $X \perp\!\!\!\perp (\{R_Y, Z\} \setminus \{R_Y\}) \mid R_Y$, and (3) $P(Z \mid R_Y) = P(Z)$ and $\emptyset$ is the minimal set such that $Z \perp\!\!\!\perp R_Y \mid \emptyset$. Therefore, the order $Y < X < Z < R_Y$ induces a general ordered factorization $P(X, Y, Z, R_Y) = P(Y \mid X, Z)\, P(X \mid R_Y)\, P(Z)\, P(R_Y)$. We now rewrite $P(X, Y, Z)$ as follows:

$$P(X, Y, Z) = \sum_{R_Y} P(Y, X, Z, R_Y) = P(Y \mid X, Z)\, P(Z) \sum_{R_Y} P(X \mid R_Y)\, P(R_Y)$$

Since $Y \perp\!\!\!\perp R_Y \mid X, Z$, $Z \perp\!\!\!\perp R_Z$ and $X \perp\!\!\!\perp R_X \mid R_Y$, by Theorem 1 we have,

$$P(X, Y, Z) = P(Y \mid X, Z, R'_X, R'_Y, R'_Z)\, P(Z \mid R'_Z) \sum_{R_Y} P(X \mid R'_X, R_Y)\, P(R_Y)$$

Indeed, equation 1 permits us to rewrite it as:

$$P(X, Y, Z) = P(Y^* \mid X^*, Z^*, R'_X, R'_Y, R'_Z)\, P(Z^* \mid R'_Z) \sum_{R_Y} P(X^* \mid R'_X, R_Y)\, P(R_Y)$$

$P(X, Y, Z)$ is recoverable because every term on the right-hand side is consistently estimable from the available dataset.

Had we ignored the minimality requirement in Definition 2 and chosen to factorize $Y < X < Z < R_Y$ using the chain rule, we would have obtained $P(X, Y, Z, R_Y) = P(Y \mid X, Z, R_Y)\, P(X \mid Z, R_Y)\, P(Z \mid R_Y)\, P(R_Y)$, which is not admissible since $X \perp\!\!\!\perp (R_Z, R_X) \mid Z$ does not hold in the graph. In other words, the existence of one admissible sequence based on an order $O$ of variables does not guarantee that every factorization based on $O$ is admissible; it is for this reason that we need to impose the condition of minimality in Definition 2.

The recovery procedure presented in Example 1 requires that we introduce $R_Y$ into the order. Indeed, there is no ordered factorization over the substantive variables $\{X, Y, Z\}$ that will permit recoverability of $P(X, Y, Z)$ in Figure 2 (a). This extension of Mohan et al. [2013] thus permits the recovery of probabilistic queries from problems in which the missingness mechanisms interact with one another.
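As a sanity check on the recovery formula of Example 1, the sketch below builds an exact manifest table from a hypothetical binary model of our own: $X \rightarrow R_Y \rightarrow R_X$, $Z$ and $R_Z$ exogenous, and $(X, Z) \rightarrow Y$. It satisfies the independences cited in Example 1 but is not claimed to be the graph of Figure 2 (a), and all parameters are arbitrary. `recovered` reads only manifest quantities; comparing it with the true joint `truth` should show agreement up to floating-point error.

```python
from itertools import product

M = "m"  # missingness symbol of equation (1)

# Hypothetical binary parameters (illustration only).
P_X1, P_Z1, P_RZ1 = 0.4, 0.3, 0.2


def bern(p1, v):
    """P(V = v) for a binary V with P(V = 1) = p1."""
    return p1 if v == 1 else 1.0 - p1


def p_y1(x, z):   # P(Y = 1 | X = x, Z = z)
    return 0.1 + 0.3 * x + 0.5 * z


def p_ry1(x):     # P(R_Y = 1 | X = x)
    return 0.2 + 0.5 * x


def p_rx1(ry):    # P(R_X = 1 | R_Y = ry): an edge between R variables
    return 0.1 + 0.6 * ry


def truth(x, y, z):
    """True P(X = x, Y = y, Z = z) in the hypothetical model."""
    return bern(P_X1, x) * bern(P_Z1, z) * bern(p_y1(x, z), y)


def manifest():
    """Exact manifest table over (X*, Y*, Z*, R_X, R_Y, R_Z) via equation (1)."""
    pm = {}
    for x, y, z, rx, ry, rz in product((0, 1), repeat=6):
        p = truth(x, y, z) * bern(p_ry1(x), ry) * bern(p_rx1(ry), rx) * bern(P_RZ1, rz)
        key = (x if rx == 0 else M, y if ry == 0 else M, z if rz == 0 else M, rx, ry, rz)
        pm[key] = pm.get(key, 0.0) + p
    return pm


# Positions of the proxies X*, Y*, Z* and of R_X, R_Y, R_Z in each key.
IDX = {"X": 0, "Y": 1, "Z": 2, "RX": 3, "RY": 4, "RZ": 5}


def prob(pm, **fixed):
    """Manifest probability that each named field takes the given value."""
    return sum(p for k, p in pm.items()
               if all(k[IDX[n]] == v for n, v in fixed.items()))


def recovered(pm, x, y, z):
    """Right-hand side of the last display of Example 1, using manifest terms only."""
    term_y = (prob(pm, X=x, Y=y, Z=z, RX=0, RY=0, RZ=0)
              / prob(pm, X=x, Z=z, RX=0, RY=0, RZ=0))
    term_z = prob(pm, Z=z, RZ=0) / prob(pm, RZ=0)
    term_x = sum(prob(pm, X=x, RX=0, RY=r) / prob(pm, RX=0, RY=r) * prob(pm, RY=r)
                 for r in (0, 1))
    return term_y * term_z * term_x
```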


4 Recoverability in the Absence of an Admissible Sequence

Mohan et al. [2013] presented a theorem (refer to appendix 10.4) that stated the necessary and sufficient condition for recovering the joint distribution for the class of problems in which the parent set of every $R$ variable is a subset of $V_o \cup V_m$. In contrast to Theorem 1, their theorem can handle problems for which no admissible sequence exists. The following theorem gives a generalization and is applicable to any given semi-Markovian model (for example, the m-graphs in Figure 2 (b) & (c)). It relies on the notion of a collider path and two new subsets: $R^{part}$, the partitions of $R$ variables, and $Mb(R^{(i)})$, the substantive variables related to $R^{(i)}$, which we will define after stating the theorem.


Theorem 2. Given an m-graph $G$ in which no element in $V_m$ is either a neighbor of its missingness mechanism or connected to its missingness mechanism by a collider path, $P(V)$ is recoverable if no $Mb(R^{(i)})$ contains a partially observed variable $X$ such that $R_X \in R^{(i)}$, i.e. $\forall i,\ R^{(i)} \cap R_{Mb(R^{(i)})} = \emptyset$. Moreover, if recoverable, $P(V)$ is given by,

$$P(V) = \frac{P(V, R = 0)}{\prod_i P(R^{(i)} = 0 \mid Mb(R^{(i)}), R_{Mb(R^{(i)})} = 0)}$$


In Theorem 2:

(i) A collider path $p$ between any two nodes $X$ and $Y$ is a path in which every intermediate node is a collider. Example: $X \rightarrow Z \leftrightarrow Y$.

(ii) $R^{part} = \{R^{(1)}, R^{(2)}, \ldots, R^{(N)}\}$ are partitions of $R$ variables such that for every element $R_X$ and $R_Y$ belonging to distinct partitions, the following conditions hold true: (i) $R_X$ and $R_Y$ are not neighbors and (ii) $R_X$ and $R_Y$ are not connected by a collider path. In Figure 2 (b): $R^{part} = \{R^{(1)}, R^{(2)}\}$ where $R^{(1)} = \{R_W, R_Z\}$, $R^{(2)} = \{R_X, R_Y\}$.

(iii) $Mb(R^{(i)})$ is the Markov blanket of $R^{(i)}$, comprising all substantive variables that are either neighbors of, or connected by a collider path to, variables in $R^{(i)}$ (Richardson [2003]). In Figure 2 (b): $Mb(R^{(1)}) = \{X, Y\}$ and $Mb(R^{(2)}) = \{Z, W\}$.

Appendix 10.6 demonstrates how Theorem 2 leads to the recoverability of $P(V)$ in Figure 2 (b), to which the theorems in Mohan et al. [2013] do not apply.
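The formula of Theorem 2 can be evaluated as a plug-in estimate once the partitions and Markov blankets have been read off the m-graph by hand. The sketch below assumes discrete variables and a manifest distribution supplied as an exact probability table (in practice one would use empirical frequencies); the function names and table layout are our own. The demo model is hypothetical ($X \rightarrow Y \rightarrow Z$, with $X \rightarrow R_Z$, $Y \rightarrow R_X$, $Z \rightarrow R_Y$ and no bi-directed edges), so every partition is a singleton; it is not the graph of Figure 2.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

M = "m"  # missingness symbol of equation (1)


def prob(pm: Dict[Tuple, float], fields: Sequence[str], fixed: Dict[str, object]) -> float:
    """Manifest probability that each named field takes its given value."""
    pos = {name: i for i, name in enumerate(fields)}
    return sum(p for k, p in pm.items() if all(k[pos[n]] == v for n, v in fixed.items()))


def theorem2_estimate(pm, fields: Sequence[str], point: Dict[str, int],
                      partitions: List[List[str]], blankets: List[List[str]]) -> float:
    """P(V) = P(V, R = 0) / prod_i P(R^(i) = 0 | Mb(R^(i)), R_Mb(R^(i)) = 0).

    Substantive names in `fields` hold the proxies V*; conditioning on R_V = 0
    makes V* equal to V, as equation (1) guarantees.
    """
    r_names = [f for f in fields if f.startswith("R_")]
    numerator = prob(pm, fields, {**point, **{r: 0 for r in r_names}})
    denominator = 1.0
    for part, mb in zip(partitions, blankets):
        r_mb = ["R_" + v for v in mb if "R_" + v in r_names]
        if set(part) & set(r_mb):
            raise ValueError("Theorem 2 requires R^(i) and R_Mb(R^(i)) to be disjoint")
        cond = {**{v: point[v] for v in mb}, **{r: 0 for r in r_mb}}
        denominator *= prob(pm, fields, {**cond, **{r: 0 for r in part}}) / prob(pm, fields, cond)
    return numerator / denominator


# Hypothetical demo model (binary variables, arbitrary parameters).
FIELDS = ("X", "Y", "Z", "R_X", "R_Y", "R_Z")


def bern(p1: float, v: int) -> float:
    return p1 if v == 1 else 1.0 - p1


def true_joint(x: int, y: int, z: int) -> float:
    return bern(0.4, x) * bern(0.2 + 0.5 * x, y) * bern(0.3 + 0.4 * y, z)


def demo_manifest() -> Dict[Tuple, float]:
    pm: Dict[Tuple, float] = {}
    for x, y, z, rx, ry, rz in product((0, 1), repeat=6):
        p = (true_joint(x, y, z) * bern(0.1 + 0.5 * y, rx)
             * bern(0.2 + 0.4 * z, ry) * bern(0.15 + 0.3 * x, rz))
        key = (x if rx == 0 else M, y if ry == 0 else M, z if rz == 0 else M, rx, ry, rz)
        pm[key] = pm.get(key, 0.0) + p
    return pm
```

Here $Mb(\{R_X\}) = \{Y\}$, $Mb(\{R_Y\}) = \{Z\}$ and $Mb(\{R_Z\}) = \{X\}$; the function only checks the disjointness condition, not the graphical premises of the theorem.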

The following corollary yields a sufficient condition for recovering the joint distribution from the class of problems in which no bi-directed edge exists between variables in sets $R$ and $V_o \cup V_m$ (for example, the m-graph described in Figure 2 (c)). These problems form a subset of the class of problems covered in Theorem 2. The subset $Pa_{sub}(R^{(i)})$ used in the corollary is the set of all substantive variables that are parents of variables in $R^{(i)}$. In Figure 2 (b): $Pa_{sub}(R^{(1)}) = \emptyset$ and $Pa_{sub}(R^{(2)}) = \{Z, W\}$.

Corollary 1. Let $G$ be an m-graph such that (i) $\forall X \in V_m \cup V_o$, no latent variable is a common parent of $X$ and any member of $R$, and (ii) $\forall Y \in V_m$, $Y$ is not a parent of $R_Y$. If, $\forall i$, $Pa_{sub}(R^{(i)})$ does not contain a partially observed variable whose missingness mechanism is in $R^{(i)}$, i.e. $R^{(i)} \cap R_{Pa_{sub}(R^{(i)})} = \emptyset$, then $P(V)$ is recoverable and is given by,

$$P(V) = \frac{P(R = 0, V)}{\prod_i P(R^{(i)} = 0 \mid Pa_{sub}(R^{(i)}), R_{Pa_{sub}(R^{(i)})} = 0)}$$

5 Non-recoverability Criteria for Joint and Conditional Distributions

Up until now, we dealt with sufficient conditions for recoverability. It is important, however, to supplement these results with criteria for non-recoverability in order to alert the user to the fact that the available assumptions are insufficient to produce a consistent estimate of the target query. Such criteria have not been treated formally in the literature thus far. In the following theorem we introduce two graphical conditions that preclude recoverability.

Theorem 3 (Non-recoverability of $P(V)$). Given a semi-Markovian model $G$, the following conditions are necessary for recoverability of the joint distribution:

(i) $\forall X \in V_m$, $X$ and $R_X$ are not neighbors, and

(ii) $\forall X \in V_m$, there does not exist a path from $X$ to $R_X$ in which every intermediate node is both a collider and a substantive variable.

In the following corollary, we leverage Theorem 3 to yield necessary conditions for recovering conditional distributions.

Corollary 2 (Non-recoverability of $P(Y \mid X)$). Let $X$ and $Y$ be disjoint subsets of substantive variables. $P(Y \mid X)$ is non-recoverable in m-graph $G$ if one of the following conditions is true:

(1) $Y$ and $R_Y$ are neighbors;

(2) $G$ contains a collider path $p$ connecting $Y$ and $R_Y$ such that all intermediate nodes in $p$ are in $X$.
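Both Theorem 3 and Corollary 2 reduce to one graph search: is a node adjacent to its missingness mechanism, or joined to it by a path whose intermediate nodes are all colliders drawn from some allowed set? The sketch below is our own illustration for a single variable, assuming the m-graph is given as a set of directed edges `(a, b)` meaning $a \rightarrow b$ plus a set of bi-directed edges, and assuming the naming convention `"R_" + name` for missingness mechanisms. Because two consecutive colliders must share an edge with arrowheads at both ends, the search only moves along bi-directed edges between intermediate nodes.

```python
from collections import deque
from typing import Iterable, Set, Tuple

Edge = Tuple[str, str]


def arrow_into(node: str, other: str, directed: Set[Edge], bidirected: Set[Edge]) -> bool:
    """True if an edge between `other` and `node` has an arrowhead at `node`."""
    return ((other, node) in directed
            or (node, other) in bidirected or (other, node) in bidirected)


def are_neighbors(a: str, b: str, directed: Set[Edge], bidirected: Set[Edge]) -> bool:
    return arrow_into(a, b, directed, bidirected) or arrow_into(b, a, directed, bidirected)


def has_collider_path(a: str, b: str, directed: Iterable[Edge], bidirected: Iterable[Edge],
                      allowed: Set[str]) -> bool:
    """Is there a path a ... b whose intermediate nodes are all colliders from `allowed`?"""
    directed, bidirected = set(directed), set(bidirected)
    nodes = {n for edge in directed | bidirected for n in edge}
    # First intermediate node: the edge from a must point into it.
    start = [n for n in nodes
             if n in allowed and n not in (a, b) and arrow_into(n, a, directed, bidirected)]
    seen, queue = set(start), deque(start)
    while queue:
        n = queue.popleft()
        if arrow_into(n, b, directed, bidirected):  # the edge to b also points into n
            return True
        for m in nodes:
            if (m in allowed and m not in seen and m not in (a, b)
                    and ((n, m) in bidirected or (m, n) in bidirected)):
                seen.add(m)
                queue.append(m)
    return False


def theorem3_violated(x: str, directed, bidirected, substantive: Set[str]) -> bool:
    """Theorem 3 fails for X: adjacent to R_X, or a collider path through substantive nodes."""
    rx = "R_" + x
    d, b = set(directed), set(bidirected)
    return are_neighbors(x, rx, d, b) or has_collider_path(x, rx, d, b, substantive)


def corollary2_flags(y: str, x_set: Set[str], directed, bidirected) -> bool:
    """Corollary 2 applies: Y adjacent to R_Y, or a collider path with all intermediates in X."""
    ry = "R_" + y
    d, b = set(directed), set(bidirected)
    return are_neighbors(y, ry, d, b) or has_collider_path(y, ry, d, b, set(x_set))
```

For instance, in a hypothetical graph with $Y \rightarrow Z \leftrightarrow R_Y$, `corollary2_flags("Y", {"Z"}, {("Y", "Z")}, {("Z", "R_Y")})` returns `True`, so $P(Y \mid Z)$ is non-recoverable there. Passing the ancestors of $Y$ and $R_Y$ as `allowed` turns `has_collider_path` into a test for inducing paths in the sense of Definition 5.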


6 Recovering Causal Queries

Given a causal query and a causal Bayesian network, a complete algorithm exists for deciding whether the query is identifiable or not (Shpitser and Pearl [2006]). Obviously, a query that is not identifiable in the substantive model is not recoverable from missing data. Therefore, a necessary condition for recoverability of a causal query is its identifiability, which we will assume in the rest of our discussion.

Definition 3 (Trivially Recoverable Query). A causal query $Q$ is said to be trivially recoverable given an m-graph $G$ if it has an estimand (in terms of substantive variables) in which every factor is recoverable.
Figure 3: m-graph in which $Y$ and $R_Y$ are not separable but still $P(Y \mid do(Z))$ is recoverable.


Classes of problems that fall into the MCAR (Missing Completely At Random) and MAR (Missing At Random) categories are much discussed in the literature (Rubin [1976]) because in such categories probabilistic queries are recoverable by graph-blind algorithms. An immediate but important implication of trivial recoverability is that if data are MAR or MCAR and the query is identifiable, then it is also recoverable by model-blind algorithms.

Example 2. In the gender wage-gap study example in Figure 1 (a), the effect of sex on income, $P(I \mid do(S))$, is identifiable and is given by $P(I \mid S)$. By Theorem 2, $P(S, X, Q, I)$ is recoverable. Hence $P(I \mid do(S))$ is recoverable.

6.1 Recovering $P(y \mid do(z))$ when $Y$ and $R_Y$ are inseparable

The recoverability of $P(V)$ hinges on the separability of a partially observed variable from its missingness mechanism (a condition established in Theorem 3). Remarkably, causal queries may circumvent this requirement. The following example demonstrates that $P(y \mid do(z))$ is recoverable even when $Y$ and $R_Y$ are not separable.

Example 3. Examine Figure 3. By the backdoor criterion, $P(y \mid do(z)) = \sum_w P(y \mid z, w)\, P(w)$. One might be tempted to conclude that the causal relation is non-recoverable because $P(w, z, y)$ is non-recoverable (by Theorem 3) and $P(y \mid z, w)$ is not recoverable (by Corollary 2). However, $P(y \mid do(z))$ is recoverable, as demonstrated below:

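$$P(y \mid do(z)) = P(y \mid do(z), R'_y) = \sum_w P(y \mid do(z), w, R'_y)\, P(w \mid do(z), R'_y) \qquad (2)$$

$$P(y \mid do(z), w, R'_y) = P(y \mid z, w, R'_y) \quad \text{(by Rule 2 of do-calculus (Pearl [2009]))} \qquad (3)$$

$$P(w \mid do(z), R'_y) = P(w \mid R'_y) \quad \text{(by Rule 3 of do-calculus)} \qquad (4)$$

Substituting (3) and (4) in (2) we get:

$$P(y \mid do(z)) = \sum_w P(y \mid z, w, R'_y)\, P(w \mid R'_y) = \sum_w P(y^* \mid z, w, R'_y)\, P(w \mid R'_y)$$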
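The final expression uses only manifest quantities. The sketch below evaluates it on an exact manifest table generated from a hypothetical model of our own in which $Y$ and $R_Y$ are inseparable: latent $U_1$ links $W$ and $Y$ ($W \leftrightarrow Y$), latent $U_2$ links $W$ and $R_Y$ ($W \leftrightarrow R_Y$), and $W \rightarrow Z \rightarrow Y$, so $Y \leftrightarrow W \leftrightarrow R_Y$ is an inducing path. We do not claim this is the graph of Figure 3, and the parameters are arbitrary; `true_do` computes the interventional truth by averaging over $U_1$, which `recovered_do` never sees.

```python
from itertools import product

M = "m"  # missingness symbol of equation (1)
P_U1, P_U2 = 0.5, 0.35  # hypothetical latent priors


def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1


def p_w1(u1, u2):  # W depends on both latents
    return 0.1 + 0.4 * u1 + 0.4 * u2


def p_z1(w):       # W -> Z
    return 0.3 + 0.5 * w


def p_y1(z, u1):   # Z -> Y and U1 -> Y
    return 0.2 + 0.3 * z + 0.4 * u1


def p_r1(u2):      # P(R_Y = 1 | U2)
    return 0.1 + 0.6 * u2


def manifest():
    """Exact P(W, Z, Y*, R_Y) after equation (1); U1 and U2 are summed out."""
    pm = {}
    for u1, u2, w, z, y, r in product((0, 1), repeat=6):
        p = (bern(P_U1, u1) * bern(P_U2, u2) * bern(p_w1(u1, u2), w)
             * bern(p_z1(w), z) * bern(p_y1(z, u1), y) * bern(p_r1(u2), r))
        key = (w, z, y if r == 0 else M, r)
        pm[key] = pm.get(key, 0.0) + p
    return pm


def prob(pm, **fixed):
    """Manifest probability; field Y holds the proxy Y*."""
    pos = {"W": 0, "Z": 1, "Y": 2, "R": 3}
    return sum(p for k, p in pm.items() if all(k[pos[n]] == v for n, v in fixed.items()))


def recovered_do(pm, y, z):
    """sum_w P(y* | z, w, R_Y = 0) P(w | R_Y = 0), the last display of Example 3."""
    total = 0.0
    for w in (0, 1):
        p_y_given = prob(pm, W=w, Z=z, Y=y, R=0) / prob(pm, W=w, Z=z, R=0)
        p_w_given = prob(pm, W=w, R=0) / prob(pm, R=0)
        total += p_y_given * p_w_given
    return total


def true_do(y, z):
    """P(y | do(z)) in the hypothetical model: average P(y | z, u1) over U1."""
    return sum(bern(P_U1, u1) * bern(p_y1(z, u1), y) for u1 in (0, 1))
```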


The recoverability of $P(y \mid do(z))$ in the previous example follows from the notions of d*-separability and dormant independence [Shpitser and Pearl, 2008].

Definition 4 (d*-separation (Shpitser and Pearl [2008])). Let $G$ be a causal diagram. Variable sets $X$, $Y$ are d*-separated in $G$ given $Z$, $W$ (written $X \perp_w Y \mid Z$) if we can find sets $Z$, $W$ such that $X \perp Y \mid Z$ in $G_{\overline{W}}$, and $P(y, x \mid z, do(w))$ is identifiable.

Definition 5 (Inducing path (Verma and Pearl [1991])). A path $p$ between $X$ and $Y$ is called an inducing path if every intermediate node on the path is a collider and an ancestor of either $X$ or $Y$.

Theorem 4. Given an m-graph in which $|V_m| = 1$ and $Y$ and $R_Y$ are connected by an inducing path, $P(y \mid do(x))$ is recoverable if there exist $Z$, $W$ such that $Y \perp_w R_Y \mid Z$ and, for $W_1 = W \setminus X$, the following conditions hold:

(1) $Y \perp\!\!\!\perp W_1 \mid X, Z$ in $G_{\overline{X}\underline{W_1}}$, and

(2) $P(W_1, Z \mid do(X))$ and $P(Y \mid do(W_1), do(X), Z, R'_Y)$ are identifiable.

Moreover, if recoverable, then

$$P(y \mid do(x)) = \sum_{W_1, Z} P(Y \mid do(W_1), do(X), Z, R'_Y)\, P(Z, W_1 \mid do(X))$$

We can quickly conclude that $P(y \mid do(z))$ is recoverable in the m-graph in Figure 3 by verifying that the conditions in Theorem 4 hold in the m-graph.
Figure 4: (a) m-graphs in which $P(y \mid do(x))$ is not recoverable; (b) m-graphs in which $P(y \mid do(x))$ is recoverable.


7 Attrition

Attrition (i.e. participants dropping out from a study/experiment) is a ubiquitous phenomenon, especially in longitudinal studies. In this section, we shall discuss a special case of attrition called 'Simple Attrition' (Garcia [2013]). In this problem, a researcher conducts a randomized trial, measures a set of variables $(X, Y, Z)$ and obtains a dataset where the outcome ($Y$) is corrupted by missing values (due to attrition). Clearly, due to randomization, the effect of treatment ($X$) on outcome ($Y$), $P(y \mid do(x))$, is identifiable and is given by $P(Y \mid X)$. We shall now demonstrate the usefulness of our previous discussion in recovering $P(y \mid do(x))$. Typical attrition problems are depicted in Figure 4. In Figure 4 (b) we can apply Theorem 1 to recover $P(y \mid do(x))$ as given below: $P(Y \mid X) = \sum_Z P(Y^* \mid X, Z, R'_Y)\, P(Z \mid X)$. In Figure 4 (a), we observe that $Y$ and $R_Y$ are connected by a collider path. Therefore, by Corollary 2, $P(Y \mid X)$ is not recoverable; hence $P(y \mid do(x))$ is also not recoverable.

7.1 Recovering Joint Distributions under Simple Attrition

The following theorem yields the necessary and sufficient condition for recovering joint distributions from semi-Markovian models with a single partially observed variable, i.e. $|V_m| = 1$, which includes models afflicted by simple attrition.

Theorem 5. Let $Y \in V_m$ and $|V_m| = 1$. $P(V)$ is recoverable in m-graph $G$ if and only if $Y$ and $R_Y$ are not neighbors and $Y$ and $R_Y$ are not connected by a path in which all intermediate nodes are colliders. If both conditions are satisfied, then $P(V)$ is given by $P(V) = P(Y \mid V_o, R_Y = 0)\, P(V_o)$.
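With a single partially observed variable, the formula of Theorem 5 is a reweighted complete-case estimate. The sketch below is a minimal plug-in version over raw records, assuming each record is a dict in which `None` plays the role of $m$ for the missing variable; the function name and the toy records are hypothetical, and the graphical conditions of the theorem are assumed rather than checked.

```python
from typing import Any, Dict, List, Optional, Sequence


def theorem5_estimate(rows: List[Dict[str, Optional[Any]]], vo: Sequence[str],
                      y_name: str, point: Dict[str, Any]) -> float:
    """Plug-in estimate of P(V = point) = P(Y = y | V_o = v_o, R_Y = 0) * P(V_o = v_o).

    `y_name` is the single partially observed variable (None when missing);
    every name in `vo` is fully observed.
    """
    target = tuple(point[v] for v in vo)
    same_vo = [r for r in rows if tuple(r[v] for v in vo) == target]
    complete = [r for r in same_vo if r[y_name] is not None]  # records with R_Y = 0
    if not complete:
        raise ValueError("no complete cases at this V_o value; positivity fails")
    p_y = sum(r[y_name] == point[y_name] for r in complete) / len(complete)
    return p_y * len(same_vo) / len(rows)


# Hypothetical toy records with V_o = {X} and V_m = {Y}.
ROWS = [{"X": 0, "Y": 1}, {"X": 0, "Y": 0}, {"X": 0, "Y": None},
        {"X": 1, "Y": 1}, {"X": 1, "Y": None}, {"X": 1, "Y": None}]
```

On these rows the estimate of $P(X = 1, Y = 1)$ is 0.5, whereas the naive frequency among complete cases is 1/3, because the missing $Y$ values are concentrated at $X = 1$.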

7.2 Recovering Causal Effects under Simple Attrition

Theorem 6. $P(y \mid do(x))$ is recoverable in the simple attrition case (with one partially observed variable) if and only if $Y$ and $R_Y$ are neither neighbors nor connected by an inducing path. Moreover, if recoverable,

$$P(Y \mid X) = \sum_Z P(Y^* \mid X, Z, R'_Y)\, P(Z \mid X) \qquad (5)$$

where $Z$ is the separating set that d-separates $Y$ from $R_Y$.
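Equation (5) needs only manifest quantities. A minimal sketch, assuming binary variables and a hypothetical simple-attrition model of our own ($X$ randomized, $X \rightarrow Z$, $(X, Z) \rightarrow Y$, $Z \rightarrow R_Y$, so $Z$ d-separates $Y$ from $R_Y$), evaluates it on the exact manifest table and contrasts it with a naive complete-case estimate.

```python
from itertools import product

M = "m"  # missingness symbol of equation (1)


def bern(p1, v):
    return p1 if v == 1 else 1.0 - p1


def manifest():
    """Exact P(X, Z, Y*, R_Y) for the hypothetical model; parameters are arbitrary."""
    pm = {}
    for x, z, y, r in product((0, 1), repeat=4):
        p = (bern(0.5, x)                             # randomized treatment
             * bern(0.3 + 0.4 * x, z)                 # X -> Z
             * bern(0.2 + 0.3 * x + 0.4 * z, y)       # (X, Z) -> Y
             * bern(0.1 + 0.6 * z, r))                # Z -> R_Y (dropout)
        key = (x, z, y if r == 0 else M, r)
        pm[key] = pm.get(key, 0.0) + p
    return pm


def prob(pm, **fixed):
    """Manifest probability; field Y holds the proxy Y*."""
    pos = {"X": 0, "Z": 1, "Y": 2, "R": 3}
    return sum(p for k, p in pm.items() if all(k[pos[n]] == v for n, v in fixed.items()))


def eq5(pm, y, x):
    """Equation (5): sum_z P(Y* = y | x, z, R_Y = 0) P(z | x)."""
    return sum(prob(pm, X=x, Z=z, Y=y, R=0) / prob(pm, X=x, Z=z, R=0)
               * prob(pm, X=x, Z=z) / prob(pm, X=x)
               for z in (0, 1))


def complete_case(pm, y, x):
    """Naive P(Y* = y | x, R_Y = 0), which ignores that dropout depends on Z."""
    return prob(pm, X=x, Y=y, R=0) / prob(pm, X=x, R=0)
```

With these parameters the true effect is $P(Y = 1 \mid X = 0) = 0.32$, which `eq5` reproduces, while `complete_case` gives 0.25 because dropout is concentrated at $Z = 1$, where $Y = 1$ is more likely.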

These results rectify prevailing opinion in the available literature. For example, according to Garcia [2013] (Theorem 3), a necessary condition for non-recoverability of the causal effect under simple attrition is that $X$ be an ancestor of $R_Y$. In Figure 4 (a), $X$ is not an ancestor of $R_Y$ and still $P(Y \mid X)$ is non-recoverable (due to the collider path between $Y$ and $R_Y$).

8 Related Work

Deletion-based methods such as listwise deletion, which are easy to understand as well as implement, guarantee consistent estimates only for certain categories of missingness such as MCAR (Rubin [1976]). The Maximum Likelihood method is known to yield consistent estimates under the MAR assumption; the expectation maximization algorithm and gradient-based algorithms are widely used for searching for ML estimates under incomplete data (Lauritzen [1995], Dempster et al. [1977], Darwiche [2009], Koller and Friedman [2009]). Most work in machine learning assumes MAR and proceeds with ML or Bayesian inference. However, there are exceptions, such as recent work on collaborative filtering and recommender systems, which develop probabilistic models that explicitly incorporate the missing data mechanism (Marlin et al. [2011], Marlin and Zemel [2009], Marlin et al. [2007]).

Other methods for handling missing data can be classified into two: (a) Inverse Probability Weighted Methods and (b) Imputation-based methods (Rothman et al. [2008]). Inverse Probability Weighting methods analyze and assign weights to complete records based on estimated probabilities of completeness (Van der Laan and Robins [2003], Robins et al. [1994]). Imputation-based methods substitute a reasonable guess in the place of a missing value (Allison [2002]), and Multiple Imputation (Little and Rubin [2002]) is a widely used imputation method.

Missing data is a special case of coarsened data, and data are said to be coarsened at random (CAR) if the coarsening mechanism is only a function of the observed data (Heitjan and Rubin [1991]). Robins and Rotnitzky [1992] introduced a methodology for parameter estimation from data structures for which full data has a non-zero probability of being fully observed, and their methodology was later extended to deal with censored data in which complete data on subjects are never observed (Van Der Laan and Robins [1998]).

The use of graphical models for handling missing data is a relatively new development. Daniel et al. [2012] used graphical models for analyzing missing information in the form of missing cases (due to sample selection bias). Attrition is a common occurrence in longitudinal studies and arises when subjects drop out of the study (Twisk and de Vente [2002], Shadish [2002]); Garcia [2013] analysed the problem of attrition using causal graphs. Thoemmes and Rose [2013] cautioned the practitioner that, contrary to popular belief, not all auxiliary variables reduce bias. Both Garcia [2013] and Thoemmes and Rose [2013] associate missingness with a single variable, and interactions among several missingness mechanisms remain unexplored.

Mohan et al. [2013] employed a formal representation called Missingness Graphs to depict the missingness process, defined the notion of recoverability and derived conditions under which queries would be recoverable when datasets are categorized as Missing Not At Random (MNAR). Tests to detect misspecifications in the m-graph are discussed in Mohan and Pearl [2014].

9 Conclusion

Graphical models play a critical role in portraying the missingness process, encoding and communicating assumptions about missingness, and deciding recoverability given a dataset afflicted with missingness. We presented graphical conditions for recovering joint and conditional distributions and sufficient conditions for recovering causal queries. We exemplified the recoverability of causal queries of the form $P(y \mid do(x))$ despite the existence of an inseparable path between $Y$ and $R_Y$, which is an insurmountable obstacle to the recovery of $P(Y)$. We applied our results to problems of attrition and presented necessary and sufficient graphical conditions for recovering causal effects in such problems.
10 Appendix

10.1 Missingness Process in Figure 1

Figure 1: Missingness graph depicting the missingness process in a hypothetical (job-specific) gender wage gap study that measured the variables sex ($S$), work experience ($X$), qualification ($Q$) and income ($I$). Fully observed and partially observed variables are represented by filled and hollow nodes respectively. While sex and work experience were found to be fully observed in all records, i.e. $V_o = \{S, X\}$, qualification and income were found to be missing in some of the records, i.e. $V_m = \{Q, I\}$. $R_Q$ and $R_I$ denote the causes of missingness of $Q$ and $I$ respectively and are assumed to be independent of $S$, $Q$, $I$ and $X$. The assumptions in the model are: (1) women are likely to be less qualified and experienced than men, (2) income is determined by the qualification and job experience of the candidate, and (3) missingness in $Q$ and $I$ is correlated, caused by unobserved common factors such as laziness or resistance to respond.


10.2 Testing compatibility between underlying and manifest distributions

Example 4. Let the incomplete dataset contain two partially observed variables, $Z$ and $W$. The tests for compatibility between the manifest distribution $P_m(Z^*, W^*, R_Z, R_W)$ and the underlying distribution $P_u(Z, W, R_Z, R_W)$ are:

Case-1: Let $X = \{Z, W\}$, then $Y = V_m \setminus X = \{\}$.

$$P_m(Z^* = z, W^* = w, R_Z = 0, R_W = 0) = P_u(Z = z, W = w, R_Z = 0, R_W = 0) \quad \forall z, w$$

Case-2: Let $X = \{Z\}$, then $Y = \{W\}$.

$$P_m(Z^* = z, W^* = m, R_Z = 0, R_W = 1) = \sum_w P_u(Z = z, w, R_Z = 0, R_W = 1) \quad \forall z$$

Case-3: Let $X = \{W\}$, then $Y = \{Z\}$.

$$P_m(Z^* = m, W^* = w, R_Z = 1, R_W = 0) = \sum_z P_u(z, W = w, R_Z = 1, R_W = 0) \quad \forall w$$

Case-4: Let $X = \{\}$, then $Y = \{Z, W\}$.

$$P_m(Z^* = m, W^* = m, R_Z = 1, R_W = 1) = \sum_{z, w} P_u(z, w, R_Z = 1, R_W = 1)$$


10.3  Proof  of  theorem  1 

Proof,  follows  from  Theorem-1  in  Mohan  et  al.  [2013]  (restated  below  as  theorem  7)  noting 
that  ordered  factorization  is  one  specific  form  of  decomposition.  □ 

Theorem 7 (Mohan et al. [2013]). A query $Q$ defined over variables in $V_o \cup V_m$ is recoverable if it is decomposable into terms of the form $Q_j = P(S_j \mid T_j)$ such that $T_j$ contains the missingness mechanism $R_v = 0$ of every partially observed variable $V$ that appears in $Q_j$.
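Once a decomposition has been proposed, the condition of Theorem 7 is purely syntactic and can be checked mechanically. The sketch below uses a hypothetical encoding: each term $P(S_j \mid T_j)$ is a pair of name sets, and the string `R_V` inside $T_j$ stands for the event $R_V = 0$. It does not verify that the decomposition itself is valid in the m-graph; that still has to be established from the graph's independencies.

```python
def satisfies_theorem7(terms, partially_observed):
    """Check the condition of Theorem 7 on a proposed decomposition.

    terms: list of (S_j, T_j) pairs, each standing for P(S_j | T_j); both are
        sets of names, and a name "R_V" in T_j stands for the event R_V = 0.
    partially_observed: names of the variables in V_m.
    """
    for s_j, t_j in terms:
        # partially observed variables mentioned anywhere in this term
        mentioned = (set(s_j) | set(t_j)) & set(partially_observed)
        if any(f"R_{v}" not in t_j for v in mentioned):
            return False
    return True

# Hypothetical query P(X, Y) with V_m = {Y}.
print(satisfies_theorem7([({"Y"}, {"X", "R_Y"}), ({"X"}, set())], {"Y"}))  # True
print(satisfies_theorem7([({"Y"}, {"X"}), ({"X"}, set())], {"Y"}))         # False
```

The first decomposition, $P(Y \mid X, R_Y = 0) P(X)$, equals $P(X, Y)$ only if $Y \perp\!\!\!\perp R_Y \mid X$ holds in the m-graph. The checker assumes that this has already been established.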


10.4 Recovering P(V) when parents of R belong to $V_o \cup V_m$


Theorem 8 (Recoverability of the Joint P(V) (Mohan et al. [2013])). Given an m-graph $G$ with no edges between the $R$ variables and no latent variables as parents of $R$ variables, a necessary and sufficient condition for recovering the joint distribution $P(V)$ is that no variable $X$ be a parent of its missingness mechanism $R_x$. Moreover, when recoverable, $P(V)$ is given by

$$P(v) = \frac{P(R = 0, v)}{\prod_i P(R_i = 0 \mid pa^o_{r_i}, pa^m_{r_i}, R_{Pa^m_{r_i}} = 0)} \tag{6}$$

where $Pa^o_{r_i} \subseteq V_o$ and $Pa^m_{r_i} \subseteq V_m$ are the parents of $R_i$.


Example 5. We wish to recover $P(X, Y, Z)$ from the m-graph in Figure 1 (a). An enumeration of the possible orderings reveals that none of them is admissible. Nevertheless, using Theorem 8, we can recover the joint probability as follows:

$$P(X, Y, Z) = \frac{P(R_x = 0, R_y = 0, R_z = 0, X, Y, Z)}{P(R_z = 0 \mid X, R_x = 0)\, P(R_x = 0 \mid Y, R_y = 0)\, P(R_y = 0 \mid Z, R_z = 0)}$$


Figure  5:  m-graph  in  which  joint  distribution  is  recoverable. 
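Example 5 can be checked numerically. The sketch below makes several assumptions. $X$, $Y$ and $Z$ are binary. The missingness structure is the one implied by the denominator above ($Y \rightarrow R_x$, $Z \rightarrow R_y$, $X \rightarrow R_z$), with each $R$ depending only on its parent. The parameter values are made up. The code builds the manifest distribution exactly and then evaluates the Theorem 8 estimator using only manifest probabilities. The recovered joint should match the true one up to floating-point error.

```python
from itertools import product

MISSING = "m"  # value of a proxy variable when the true value is not observed

# Hypothetical parameters (not from the paper).
cells = list(product((0, 1), repeat=3))
p_xyz = {cell: (i + 1) / 36 for i, cell in enumerate(cells)}  # weights 1..8, sum 36
p_miss_x = {0: 0.2, 1: 0.5}  # P(R_x = 1 | Y = y)
p_miss_y = {0: 0.3, 1: 0.6}  # P(R_y = 1 | Z = z)
p_miss_z = {0: 0.1, 1: 0.4}  # P(R_z = 1 | X = x)

def bern(p_one, r):
    """P(R = r) for a binary R with P(R = 1) = p_one."""
    return p_one if r == 1 else 1 - p_one

# Manifest P_m(X*, Y*, Z*, R_x, R_y, R_z); the estimator reads nothing else.
pm = {}
for (x, y, z), p in p_xyz.items():
    for rx, ry, rz in product((0, 1), repeat=3):
        q = p * bern(p_miss_x[y], rx) * bern(p_miss_y[z], ry) * bern(p_miss_z[x], rz)
        key = (x if rx == 0 else MISSING, y if ry == 0 else MISSING,
               z if rz == 0 else MISSING, rx, ry, rz)
        pm[key] = pm.get(key, 0.0) + q

def prob(event):
    """Manifest probability of an event on (x*, y*, z*, r_x, r_y, r_z)."""
    return sum(p for key, p in pm.items() if event(*key))

def observed_rate(target_r, parent, value):
    """P(R_target = 0 | parent = value, R_parent = 0), read off pm alone.

    Key positions: 0-2 are X*, Y*, Z*; 3-5 are R_x, R_y, R_z.
    """
    parent_r = parent + 3
    def given(*k):
        return k[parent_r] == 0 and k[parent] == value
    def both(*k):
        return given(*k) and k[target_r] == 0
    return prob(both) / prob(given)

def recover(x, y, z):
    """Equation (6): P(x, y, z) = P(R = 0, x, y, z) / prod_i P(R_i = 0 | pa, R_pa = 0)."""
    numerator = pm.get((x, y, z, 0, 0, 0), 0.0)
    denominator = (observed_rate(5, 0, x)     # P(R_z = 0 | X = x, R_x = 0)
                   * observed_rate(3, 1, y)   # P(R_x = 0 | Y = y, R_y = 0)
                   * observed_rate(4, 2, z))  # P(R_y = 0 | Z = z, R_z = 0)
    return numerator / denominator

recovered = {cell: recover(*cell) for cell in cells}
print(max(abs(recovered[c] - p_xyz[c]) for c in cells))  # largest absolute error
```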


10.5  Proof  of  Theorem  2 


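Proof.

$$P(V) = \frac{P(R = 0, V)}{P(R = 0 \mid V)} = \frac{P(R = 0, V)}{P(R^{(1)} = 0, R^{(2)} = 0, \dots, R^{(N)} = 0 \mid V)}$$

$Mb(R^{(i)})$ d-separates $R^{(i)}$ from all variables that are not in $R^{(i)} \cup Mb(R^{(i)})$, i.e.,

$$R^{(i)} \perp\!\!\!\perp (\{R, V\} \setminus \{R^{(i)}, Mb(R^{(i)})\}) \mid Mb(R^{(i)}).$$

Hence, factorizing the denominator by the chain rule and applying this independence to each factor,

$$P(V) = \frac{P(R = 0, V)}{\prod_i P(R^{(i)} = 0 \mid Mb(R^{(i)}))}$$

Using $R^{(i)} \cap R_{Mb(R^{(i)})} = \emptyset$ and $R^{(i)} \perp\!\!\!\perp (\{R, V\} \setminus \{R^{(i)}, Mb(R^{(i)})\}) \mid Mb(R^{(i)})$, we get

$$P(V) = \frac{P(R = 0, V)}{\prod_i P(R^{(i)} = 0 \mid Mb(R^{(i)}), R_{Mb(R^{(i)})} = 0)}$$

Now we can directly apply equation 1 and express $P(V)$ in terms of quantities estimable from the available dataset. Therefore, $P(V)$ is recoverable. □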


10.6  Example:  Recoverability  by  Theorem  2 


Example 6. $P(X, Y, Z, W)$ is the query of interest, and Figure 2 (b) depicts the missingness process and identifies the sets $R_{part}$ and $Mb(R^{(i)})$. A quick inspection reveals that no admissible sequence exists. However, notice that $CI_1: R^{(1)} \perp\!\!\!\perp (R^{(2)}, Mb(R^{(2)})) \mid Mb(R^{(1)})$ and $CI_2: R^{(2)} \perp\!\!\!\perp (R^{(1)}, Mb(R^{(1)})) \mid Mb(R^{(2)})$ hold in the m-graph. We exploit these independencies to recover the joint distribution as detailed below:

$$P(X, Y, Z, W) = \frac{P(R = 0, X, Y, Z, W)}{P(R = 0 \mid X, Y, Z, W)} = \frac{P(R = 0, X, Y, Z, W)}{P(R^{(1)} = 0, R^{(2)} = 0 \mid X, Y, Z, W)}$$

$$= \frac{P(R = 0, X, Y, Z, W)}{P(R^{(1)} = 0 \mid X, Y, R^{(2)} = 0)\, P(R^{(2)} = 0 \mid Z, W, R^{(1)} = 0)} \quad \text{(using } CI_1 \text{ and } CI_2\text{)}$$

$$P(V) = \frac{P(R = 0, X^*, Y^*, Z^*, W^*)}{P(R_w = 0, R_z = 0 \mid X^*, Y^*, R_x = 0, R_y = 0)\, P(R_x = 0, R_y = 0 \mid Z^*, W^*, R_z = 0, R_w = 0)} \quad \text{(using equation 1)}$$
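The recovery in Example 6 hinges on the disjointness condition $R^{(i)} \cap R_{Mb(R^{(i)})} = \emptyset$ used in the proof of Theorem 2. The short Python check below is a sketch under several assumptions. Blocks and blankets are supplied by hand, since reading $Mb(R^{(i)})$ off the m-graph is not automated. Indicator names follow a hypothetical `R_<variable>` convention. The blankets $\{X, Y\}$ and $\{Z, W\}$ for Example 6 are taken from the conditioning sets in the derivation above.

```python
def partition_condition_holds(blocks, blankets):
    """Check that no block R^(i) contains the indicator of a variable in Mb(R^(i)).

    blocks: list of sets of indicator names, using the hypothetical convention
        "R_V" for the indicator of variable V.
    blankets: list of sets of variable names; blankets[i] plays the role of
        Mb(R^(i)) and must be read off the m-graph by hand.
    """
    if len(blocks) != len(blankets):
        raise ValueError("one blanket is needed per block")
    for block, blanket in zip(blocks, blankets):
        if set(block) & {f"R_{v}" for v in blanket}:
            return False
    return True

# Example 6: R^(1) = {R_Z, R_W} with Mb = {X, Y}; R^(2) = {R_X, R_Y} with Mb = {Z, W}.
print(partition_condition_holds([{"R_Z", "R_W"}, {"R_X", "R_Y"}],
                                [{"X", "Y"}, {"Z", "W"}]))  # True
# One block holding every indicator, with a hypothetical blanket {X, Y, Z, W}.
print(partition_condition_holds([{"R_X", "R_Y", "R_Z", "R_W"}],
                                [{"X", "Y", "Z", "W"}]))    # False
```

The second call shows why the partition matters. Lumping all indicators into one block puts the indicators of the blanket's own variables inside the block, so the corresponding factor would condition on a partially observed variable without being able to set its indicator to 0.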


10.7  Proof  of  Corollary  1 


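Proof.

$$P(V) = \frac{P(R = 0, V)}{P(R = 0 \mid V)} = \frac{P(R = 0, V)}{P(R^{(1)} = 0, R^{(2)} = 0, \dots, R^{(N)} = 0 \mid V)}$$

Since $Pa_{sub}(R^{(i)}) \subseteq V$ d-separates $R^{(i)}$ from all the other variables in $(V \cup R) \setminus (R^{(i)} \cup Pa_{sub}(R^{(i)}))$, we get

$$P(V) = \frac{P(R = 0, V)}{\prod_i P(R^{(i)} = 0 \mid Pa_{sub}(R^{(i)}))}$$

Using $R^{(i)} \cap R_{Pa_{sub}(R^{(i)})} = \emptyset$ and $R^{(i)} \perp\!\!\!\perp (\{R, V\} \setminus \{R^{(i)}, Pa_{sub}(R^{(i)})\}) \mid Pa_{sub}(R^{(i)})$, we get

$$P(V) = \frac{P(R = 0, V)}{\prod_i P(R^{(i)} = 0 \mid Pa_{sub}(R^{(i)}), R_{Pa_{sub}(R^{(i)})} = 0)}$$

□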


10.8  Proof  of  Theorem  3 

We will use the following lemma (stated and proved in the supplementary material of Mohan et al. [2013]) in our proof.

Lemma 1. If a target relation $Q$ is not recoverable in m-graph $G$, then $Q$ is not recoverable in the graph $G'$ resulting from adding a single edge to $G$.

Proof. Non-recoverability of $P(V)$ when $X$ is a parent of $R_x$ has been proved in Mohan et al. [2013]. If $P(V)$ is non-recoverable when $G$ contains the subgraph $G_1: X \rightarrow R_x$, then $P(V)$ is non-recoverable when $G$ contains the subgraph $G_2: X \leftarrow U \rightarrow R_x$, since (a) $G_1$ and $G_2$ are equivalent models and (b) we are dealing with recoverability of a probabilistic query. Nevertheless, a detailed proof by construction follows.

$M_1$ and $M_2$ are two models in which the variables $U$, $X$ and $R_x$ are binary and $U$ is a fair coin. In $M_1$, $X = 0$ and $R_x = u$; in $M_2$, $X = u$ and $R_x = u$. Notice that although the two models agree on the manifest distribution, they disagree on the query $P(X)$: both induce $P_m(X^* = 0, R_x = 0) = P_m(X^* = m, R_x = 1) = 1/2$, yet $P(X = 0) = 1$ in $M_1$ and $P(X = 0) = 1/2$ in $M_2$. Hence $P(X)$ is non-recoverable in $X \leftarrow U \rightarrow R_x$. Using Lemma 1, we can conclude that $P(V)$ is non-recoverable in any m-graph in which $X$ and $R_x$ are connected by a bi-directed edge.
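The construction can be verified by enumeration. This sketch uses exact fractions to show that $M_1$ and $M_2$ induce the same manifest distribution $P_m(X^*, R_x)$ while disagreeing on $P(X = 0)$. The function names are hypothetical.

```python
from fractions import Fraction

MISSING = "m"  # value of the proxy X* when X is not observed (R_x = 1)

def model_m1(u):
    """M1: X = 0 regardless of U, and R_x = U."""
    return 0, u

def model_m2(u):
    """M2: X = U and R_x = U."""
    return u, u

def summarize(model):
    """Return the manifest P_m(X*, R_x) and the query P(X = 0) for a fair coin U."""
    manifest, p_x0 = {}, Fraction(0)
    for u in (0, 1):
        x, rx = model(u)
        x_star = x if rx == 0 else MISSING
        manifest[(x_star, rx)] = manifest.get((x_star, rx), Fraction(0)) + Fraction(1, 2)
        if x == 0:
            p_x0 += Fraction(1, 2)
    return manifest, p_x0

pm1, q1 = summarize(model_m1)
pm2, q2 = summarize(model_m2)
print(pm1 == pm2)  # True
print(q1, q2)      # 1 1/2
```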




Figure 6: An m-graph in which $P(X, Z)$ is not recoverable, where $Z = \{Z_1, Z_2, \dots, Z_k\}$. $X$ is partially observed, all $Z$ variables are fully observed, the parents of $Z_i$ are $U_{i-1}$ and $U_i$, the parent of $X$ is $U_0$ and the parent of $R_x$ is $U_k$.


Given the m-graph in Figure 6, we will now prove that $P(X, Z_1, Z_2, \dots, Z_k)$ is non-recoverable. Let $M_3$ and $M_4$ be two models such that all the variables are binary, all the $U$ variables are fair coins, $X = U_0$, $R_x = U_k$ and $Z_i = U_{i-1} \oplus U_i$ for $1 \le i < k$. In $M_3$, $Z_k = U_{k-1}$ and in $M_4$, $Z_k = U_{k-1} \oplus U_k$. Both models yield the same manifest distribution. However, they disagree on the query $P(X, Z_1, Z_2, \dots, Z_k)$. For instance, in $M_3$, $P(X = 0, Z = 0, R_x = 1) > 0$, whereas in $M_4$, $P(X = 0, Z = 0, R_x = 1) = 0$. Therefore in $M_4$, $P(X = 0, Z = 0) = P(X = 0, Z = 0, R_x = 0)$, and in $M_3$, $P(X = 0, Z = 0) = P(X = 0, Z = 0, R_x = 0) + P(X = 0, Z = 0, R_x = 1)$. Since $P(X = 0, Z = 0, R_x = 0) = P_m(X^* = 0, Z = 0, R_x = 0)$ is the same in both models, the two models assign different values to $P(X = 0, Z = 0)$. Hence in the m-graph in Figure 6, the joint distribution $P(X, Z)$ is non-recoverable. Using Lemma 1, we can conclude that the joint distribution is non-recoverable in any m-graph which has a bi-directed path from any partially observed variable $X$ to its missingness mechanism $R_x$. □
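The same enumeration extends to the models $M_3$ and $M_4$ of Figure 6. The sketch below enumerates all values of $U_0, \dots, U_k$ for a few small $k$, compares the manifest distributions $P_m(X^*, Z_1, \dots, Z_k, R_x)$, and evaluates $P(X = 0, Z = 0)$ in both models. It uses exact fractions, and the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

MISSING = "m"  # value of the proxy X* when X is not observed (R_x = 1)

def summarize(k, xor_last):
    """Enumerate the fair coins U_0..U_k of Figure 6.

    xor_last=False gives M3 (Z_k = U_{k-1}); xor_last=True gives M4
    (Z_k = U_{k-1} XOR U_k). Returns the manifest P_m(X*, Z_1..Z_k, R_x)
    and the query P(X = 0, Z = 0).
    """
    weight = Fraction(1, 2 ** (k + 1))
    manifest, query = {}, Fraction(0)
    for u in product((0, 1), repeat=k + 1):
        x, rx = u[0], u[k]                                   # X = U_0, R_x = U_k
        z = [u[i - 1] ^ u[i] for i in range(1, k)]           # Z_i, 1 <= i < k
        z.append(u[k - 1] ^ u[k] if xor_last else u[k - 1])  # Z_k
        key = (x if rx == 0 else MISSING, tuple(z), rx)
        manifest[key] = manifest.get(key, Fraction(0)) + weight
        if x == 0 and not any(z):
            query += weight
    return manifest, query

for k in (1, 2, 3):
    pm3, q3 = summarize(k, xor_last=False)
    pm4, q4 = summarize(k, xor_last=True)
    print(k, pm3 == pm4, q3, q4)
# 1 True 1/2 1/4
# 2 True 1/4 1/8
# 3 True 1/8 1/16
```

For every $k$ tried, the manifest distributions coincide, while the query equals $1/2^k$ in $M_3$ and $1/2^{k+1}$ in $M_4$.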

10.9  Proof  of  Corollary  2 

Proof. Let $|V_m| = 1$ and let $Y_j \in Y$ be the only partially observed variable. Let $G'$ be the subgraph containing all variables in $X \cup Y \cup \{R_{Y_j}, Y_j^*\}$. We know that if (1) or (2) is true, then (i) $P(X, Y)$ is not recoverable in $G'$ and (ii) $P(X)$ is recoverable in $G'$. Therefore, $P(Y \mid X) = \frac{P(X, Y)}{P(X)}$ is not recoverable in $G'$ (otherwise $P(X, Y) = P(Y \mid X) P(X)$ would be recoverable) and hence, by Lemma 1, not recoverable in $G$. □

10.10  Proof  of  Theorem  4 

Proof. $P(Y \mid do(X)) = \sum_{z, w'} P(Y \mid z, w', do(X))\, P(z, w' \mid do(X))$

If condition 1 holds, then by Rule 2 of do-calculus (Pearl [2009]) we have:

$$P(Y \mid Z, W', do(X)) = P(Y \mid Z, do(X), do(W'))$$

Since $Y \perp\!\!\!\perp R_y \mid Z$,

$$P(Y \mid Z, do(X), do(W')) = P(Y \mid Z, do(X), do(W'), R_y = 0) = P(Y^* \mid Z, do(X), do(W'), R_y = 0)$$

Therefore, $P(y \mid do(x))$ is recoverable. □

10.11  Proof  of  Theorem  5 

Proof. (Sufficiency) Whenever (1) and (2) are satisfied, $Y \perp\!\!\!\perp R_y \mid V_o$ holds. Hence $P(V)$, which may be written as $P(Y \mid V_o) P(V_o)$, can be recovered as $P(Y^* \mid V_o, R_y = 0) P(V_o)$.
(Necessity) Follows from Theorem 2. □
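The recovery formula $P(Y^* \mid V_o, R_y = 0) P(V_o)$ translates directly into a plug-in estimator. The sketch below uses empirical frequencies and assumes a hypothetical record format: pairs `(v_o, y)` with `y = None` when $Y$ is missing. Strata of $V_o$ with no complete case have no estimate and are omitted from the result.

```python
from collections import Counter

def recover_joint(records):
    """Plug-in estimate of P(V_o = v_o, Y = y) = P(Y* = y | v_o, R_y = 0) P(v_o).

    records: list of (v_o, y) pairs, where v_o is a tuple of fully observed
    values and y is None when Y is missing (R_y = 1).
    """
    n = len(records)
    n_vo = Counter(vo for vo, _ in records)  # all records estimate P(V_o)
    complete = Counter((vo, y) for vo, y in records if y is not None)
    complete_vo = Counter(vo for vo, y in records if y is not None)
    return {(vo, y): (n_vo[vo] / n) * (c / complete_vo[vo])
            for (vo, y), c in complete.items()}

# Hypothetical data with a single fully observed variable in V_o.
records = [((0,), 1), ((0,), 0), ((0,), None), ((0,), 1),
           ((1,), 1), ((1,), None), ((1,), None), ((1,), 0)]
est = recover_joint(records)
# e.g. P(V_o = (0,), Y = 1) is estimated as (4/8) * (2/3) = 1/3
```

Dropping incomplete records instead would give $2/5$ for the same cell. Complete-case analysis overweights the stratum $V_o = (0,)$, which has fewer missing outcomes.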

10.12  Proof  of  Theorem  6 

Proof. (Sufficiency) Under simple attrition, all paths from $Y$ to $R_y$ that contain $X$ are blocked by $X$. Therefore, when both conditions specified in the theorem are satisfied, $Y$ and $R_y$ are separable. Given that $Z$ is any separator between $Y$ and $R_y$, $P(Y \mid X)$ may be recovered as $\sum_z P(Y^* \mid X, z, R_y = 0) P(z \mid X)$.

(Necessity) Follows from Theorem 2. □
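The attrition estimator $\sum_z P(Y^* \mid X, z, R_y = 0) P(z \mid X)$ can likewise be computed from frequencies. The sketch assumes hypothetical records `(x, z, y)` in which $X$ and $Z$ are fully observed and `y` is `None` for units lost to attrition. It raises an error when a stratum of $(X, Z)$ has no complete case, because that term is then not estimable from the data.

```python
from collections import Counter

def recover_conditional(records, x, y):
    """Plug-in estimate of P(Y = y | X = x) = sum_z P(Y* = y | x, z, R_y = 0) P(z | x).

    records: list of (x, z, y) triples with X and Z fully observed; y is None
    for units lost to attrition (R_y = 1).
    """
    in_x = [(z, yy) for xx, z, yy in records if xx == x]
    z_counts = Counter(z for z, _ in in_x)  # all units with X = x estimate P(z | x)
    total = 0.0
    for z, n_z in z_counts.items():
        observed = [yy for zz, yy in in_x if zz == z and yy is not None]
        if not observed:
            raise ValueError(f"no complete case with X={x}, Z={z}")
        total += (n_z / len(in_x)) * (observed.count(y) / len(observed))
    return total

# Hypothetical data: (x, z, y), with y = None after dropout.
records = [(1, 0, 1), (1, 0, 1), (1, 0, None), (1, 0, 0),
           (1, 1, 0), (1, 1, None), (1, 1, None), (1, 1, 1),
           (0, 0, 0), (0, 1, 1)]
print(recover_conditional(records, 1, 1))  # approximately 7/12 (about 0.5833)
```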